Let $X = (X_1, \ldots, X_k)$ be a random vector whose distribution depends on an unknown vector parameter $\theta = (\theta_1, \ldots, \theta_k)$. The marginal distribution of $X_i$ depends on $\theta_i$ only, $i = 1, \ldots, k$. This paper deals with the problem of selecting the largest component of $\theta$ and the analogous problem of selecting a subset of the components of $\theta$ which includes the largest component. We consider the selection problem in a general decision-theoretic framework and derive Bayes rules for selecting the largest component. The Bayes rules are shown to have certain optimal properties. The ordinary selection rules are shown to be Bayes rules with respect to a special loss function.

1. Introduction. Let $X = (X_1, \ldots, X_k)$ be a random vector whose distribution depends on an unknown vector parameter $\theta = (\theta_1, \ldots, \theta_k)$. The marginal distribution of $X_i$ depends on $\theta_i$ only, for each $i = 1, \ldots, k$. We consider the problem of selecting the largest component of $\theta$ given $x$, an observed value of $X$.

There are two formulations of the selection problem which have generally been considered in the literature. In one, the goal is to select the largest component with a "high" probability. In the other, the goal is to select a subset of the $k$ components which includes the largest component with a high probability and includes each of the remaining components with a "low" probability. In the second case, a selection is correct if the largest component is included in the selected subset.

In the standard formulation of the selection problem a minimum probability $P^*$ is pre-assigned, and the probability of a correct selection (PCS) is required to be at least $P^*$. This is called the $P^*$-condition. To meet the $P^*$-condition one needs to find a "least favorable" configuration (lfc) of the parameter space, that is, a configuration at which the PCS is minimized. The lfc is easily found in some special cases of the underlying distribution of $X$ that have been considered in the literature. In other cases the minimization of the PCS is not so straightforward. Consider, for example, the case where $X$ has a multivariate normal distribution with mean $\theta$ and known covariance matrix $\Sigma$. A simple rule can be given for selecting the largest component of $\theta$ in the special case when the components of $X$ have a common variance and are equicorrelated (see Gibbons, Olkin and Sobel (1977), §15.2.1). It is not simple to find an optimal selection rule when $\Sigma$ is more general. The difficulty arises even in the case where the components of $X$ are uncorrelated but have unequal variances.
For this case various rules have been proposed in the literature for selecting a subset which includes the largest component of $\theta$. Berger and Gupta (1980) examined these rules and compared them under certain criteria of optimality.

In this paper we consider the selection problem in a Bayesian framework.

The Bayes formulation involves the specification of a loss function and the assumption of a prior distribution for the parameter $\theta$. Given the loss function and the prior distribution of $\theta$, it is fairly easy to find an optimal selection rule, called a Bayes rule. The Bayes solution does not involve the minimization problem of finding the least favorable configuration. Therefore, at least from the point of view of mathematical simplicity, a Bayes solution of the selection problem should be more attractive than the standard method discussed above.

In Section 2 we give a decision-theoretic formulation of the selection problem and derive the Bayes solution for a general loss function. We illustrate our result with an example from the multivariate normal distribution.

Berger and Gupta (1980) have considered a monotonicity property for an optimal selection rule; a rule is said to be just if it has that property. In Section 3 we show that our Bayes rules are just if certain conditions on the distribution of $X$ and the loss function are met.


2. Bayes selection rules. We formulate the selection problem in a decision-theoretic framework as follows. A rule for selecting the largest component of $\theta$ (for selecting a subset of the $k$ components which includes the largest component) is given by a vector $\phi(x) = (\phi_1(x), \ldots, \phi_k(x))$, where $\phi_i(x)$ denotes the probability that the $i$th component is selected (included in the selected subset) when $x$ is the observed value of $X$. For the problem of selecting the largest component we have

$$\sum_{i=1}^{k} \phi_i(x) = 1 \quad \text{for all } x. \tag{2.1}$$

First, consider the subset selection problem, which we call Problem I. Let $L_i(\theta)$ denote the loss incurred by including the $i$th component in the selected subset, and let $L_i^*(\theta)$ denote the loss incurred by excluding the $i$th component from the selected subset. The total loss due to selecting a subset $\delta$ is given by

$$L(\delta, \theta) = \sum_{i=1}^{k} \delta_i L_i(\theta) + \sum_{i=1}^{k} (1 - \delta_i) L_i^*(\theta) = \sum_{i=1}^{k} \delta_i \bigl(L_i(\theta) - L_i^*(\theta)\bigr) + \sum_{i=1}^{k} L_i^*(\theta), \tag{2.2}$$

where $\delta = (\delta_1, \ldots, \delta_k)$ and $\delta_i = 1$ $(0)$ if the $i$th component is included in (excluded from) the selected subset. We assume that

$$\sum_{i=1}^{k} \bigl(L_i(\theta) - L_i^*(\theta)\bigr) \le 0 \quad \text{for all } \theta. \tag{2.3}$$

This inequality states that the loss due to including all the components in the selected subset does not exceed the loss due to excluding all the components from the selected subset. Therefore, we include at least one component in the selected subset.

Next, consider the problem of selecting the largest component, which we call Problem II. Using the same generic notation for the loss function as in Problem I, we let $L_i(\theta)$ denote the loss due to selecting the $i$th component as the largest component, and let $L_i^*(\theta)$ be the loss due to not selecting the $i$th component. The total loss due to selecting a component $\delta$ is given by (2.2), where now $\delta_i = 1$ $(0)$ if the $i$th component is (is not) selected as the largest component.

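Consider a special case of the loss function given above. Let

$$L_i(\theta) = \begin{cases} 0 & \text{if } \theta_i = \theta_{[k]}, \\ 1 & \text{if } \theta_i \ne \theta_{[k]}, \end{cases} \qquad L_i^*(\theta) = c\bigl(1 - L_i(\theta)\bigr), \tag{2.4}$$

where $c$ is a positive number and $\theta_{[k]} = \max(\theta_1, \ldots, \theta_k)$. We let $c \ge k-1$ for Problem I and $c = 1$ for Problem II. In Problem I the value of $c$ measures the loss due to excluding the largest component from the selected subset, relative to the loss due to including a wrong component in the selected subset. The inequality (2.3) holds since $c \ge k-1$: when the largest component is unique, $\sum_i L_i(\theta) = k-1$ and $\sum_i L_i^*(\theta) = c$.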

The risk, that is, the expected loss, of a selection rule $\phi = \phi(x)$ is given by

$$R_\phi(\theta) = \sum_{i=1}^{k} \bigl(L_i(\theta) - L_i^*(\theta)\bigr)\, E_\theta\, \phi_i(X) + \sum_{i=1}^{k} L_i^*(\theta). \tag{2.5}$$

In Problem I the risk for the loss function (2.4) is equal to the sum of $c(1 - \text{PCS})$ and the expected number of wrong components included in the selected subset. In Problem II the risk for the same loss function is equal to $(1+c)(1 - \text{PCS})$.
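As a quick numerical check of (2.2) and of the two risk statements above, the following Python sketch evaluates the loss (2.2) under the special loss function (2.4) for one hypothetical parameter vector. The function names `losses_24` and `total_loss` are illustrative only, indices are 0-based, and the sketch assumes the largest component of $\theta$ is unique.

```python
def losses_24(theta, c):
    """Loss function (2.4): L_i = 0 for the largest component of theta and 1
    otherwise; L*_i = c * (1 - L_i). Assumes the maximum is unique."""
    top = max(range(len(theta)), key=lambda i: theta[i])
    L = [0 if i == top else 1 for i in range(len(theta))]
    L_star = [c * (1 - l) for l in L]
    return L, L_star


def total_loss(delta, theta, c):
    """Loss (2.2): sum_i delta_i * L_i + sum_i (1 - delta_i) * L*_i."""
    L, L_star = losses_24(theta, c)
    return sum(d * l + (1 - d) * ls for d, l, ls in zip(delta, L, L_star))


theta = [0.3, 1.2, -0.5, 0.9]  # hypothetical parameter; the largest is theta[1]
c = 3                          # Problem I with k = 4 requires c >= k - 1 = 3

# Problem I: a subset containing the largest component plus one wrong component.
print(total_loss([0, 1, 0, 1], theta, c))  # 1 (one wrong inclusion)
# A subset missing the largest component and containing two wrong components.
print(total_loss([1, 0, 0, 1], theta, c))  # 5 (two wrong inclusions + c)

# Inequality (2.3): the left side equals (k - 1) - c here, so it holds iff c >= k - 1.
L, L_star = losses_24(theta, c)
print(sum(l - ls for l, ls in zip(L, L_star)))  # 0

# Problem II (c = 1): choosing a single wrong component costs 1 + c.
print(total_loss([1, 0, 0, 0], theta, 1))  # 2
```

Averaging these losses over $X$ gives exactly the two risk expressions: the number of wrong inclusions plus $c$ times the indicator of missing the largest component in Problem I, and $(1+c)$ times the indicator of a wrong choice in Problem II.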

Let $P_\theta$ denote the conditional distribution of $X$ given $\theta$. We suppose that this distribution has a density $p_\theta(x)$ with respect to a $\sigma$-finite measure $\mu$ on $\mathbb{R}^k$. For the Bayes formulation of the selection problem we assume that $\theta$ is distributed a priori according to a probability distribution $G$, say. The optimal selection rule is a function $\phi$ which minimizes the average of the risk function with respect to the given prior distribution of $\theta$, given by

$$r(\phi) = \int R_\phi(\theta)\, dG(\theta).$$

Let

$$M_i(x) = \int \bigl(L_i(\theta) - L_i^*(\theta)\bigr)\, p_\theta(x)\, dG(\theta), \tag{2.6}$$

$$M(x) = \min\bigl(M_1(x), \ldots, M_k(x)\bigr). \tag{2.7}$$


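By virtue of (2.5), $r(\phi) = \sum_{i=1}^{k} \int \phi_i(x) M_i(x)\, d\mu(x) + \int \sum_{i=1}^{k} L_i^*(\theta)\, dG(\theta)$, so a Bayes rule for Problem I is given by

$$\phi_i(x) = \begin{cases} 1 & \text{if } M_i(x) \le 0, \\ 0 & \text{otherwise}. \end{cases} \tag{2.8}$$

The Bayes rule for Problem II is given by

$$\phi_i(x) = \begin{cases} 1 & \text{if } M_i(x) = M(x), \\ 0 & \text{otherwise}. \end{cases} \tag{2.9}$$

If $M_i(x) = M(x)$ for several values of $i$, we select the smallest among the tied values of $i$ as the largest component. We note that the Bayes rules (2.8) and (2.9) are both non-randomized selection rules.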
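Given the values $M_1(x), \ldots, M_k(x)$ at an observed $x$, rules (2.8) and (2.9) are simple to apply. The Python sketch below assumes those values have already been computed; the numbers in the usage lines are hypothetical, and indices are 0-based. Integrating (2.3) against $p_\theta(x)\,dG(\theta)$ gives $\sum_i M_i(x) \le 0$, so the subset rule never returns an empty subset.

```python
def bayes_subset_rule(M):
    """Rule (2.8) for Problem I: include component i iff M_i(x) <= 0."""
    return [1 if m <= 0 else 0 for m in M]


def bayes_single_rule(M):
    """Rule (2.9) for Problem II with the tie-breaking convention:
    select the smallest index i such that M_i(x) = M(x) = min_j M_j(x)."""
    chosen = M.index(min(M))  # list.index returns the first, i.e. smallest, tied index
    return [1 if i == chosen else 0 for i in range(len(M))]


M = [0.2, -0.1, -0.4, -0.4]   # hypothetical values of M_1(x), ..., M_4(x)
print(bayes_subset_rule(M))   # [0, 1, 1, 1]
print(bayes_single_rule(M))   # [0, 0, 1, 0]  (tie between indices 2 and 3)
```

Exact floating-point equality is used for ties, which is adequate for an illustration but would need a tolerance if the $M_i(x)$ came from numerical integration.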
We illustrate our results by the following example.

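Example. Let $X$ be distributed according to a multivariate normal distribution $N(\theta, \Sigma)$, where the covariance matrix $\Sigma$ is diagonal, with $i$th diagonal element $\sigma_i^2$. Let the loss function be given by (2.4), and let $\theta$ be distributed a priori according to $N(0, \tau^2 I)$, where $I$ denotes the identity matrix. Let $\varphi$ and $\Phi$ denote the standard normal density and cdf, respectively, and let

$$\lambda_i = \frac{\tau}{\sqrt{\tau^2 + \sigma_i^2}}, \qquad i = 1, \ldots, k.$$

By direct computation we get

$$M_i(x) = p(x)\left[1 - (1+c) \int_{-\infty}^{\infty} \prod_{\substack{j=1 \\ j \ne i}}^{k} \Phi\!\left(\frac{\lambda_i \sigma_i}{\lambda_j \sigma_j}\, u + \frac{\lambda_i^2 x_i - \lambda_j^2 x_j}{\lambda_j \sigma_j}\right) \varphi(u)\, du\right], \tag{2.10}$$

where

$$p(x) = \int p_\theta(x)\, dG(\theta) = \left(\prod_{i=1}^{k} \bigl(2\pi(\sigma_i^2 + \tau^2)\bigr)^{-1/2}\right) \exp\!\left(-\frac{1}{2} \sum_{i=1}^{k} \frac{x_i^2}{\sigma_i^2 + \tau^2}\right)$$

denotes the marginal density of $X$. The integral in (2.10) is the posterior probability that $\theta_i$ is the largest component, since a posteriori the $\theta_j$ are independent with $\theta_j \sim N(\lambda_j^2 x_j, \lambda_j^2 \sigma_j^2)$.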
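Formula (2.10) can be checked numerically. The Python sketch below, using only the standard library, evaluates the integral in (2.10) with the trapezoidal rule and compares it with a Monte Carlo estimate that samples $\theta_j$ independently from $N(\lambda_j^2 x_j, \lambda_j^2 \sigma_j^2)$ and counts how often $\theta_i$ is the largest. The helper names, the truncation of the integral to $[-10, 10]$, and the data are assumptions made for illustration.

```python
import math
import random
from statistics import NormalDist

STD = NormalDist()  # standard normal: STD.cdf is Phi, STD.pdf is varphi


def prob_largest(i, x, sigma, tau, n=4001, half_width=10.0):
    """Integral in (2.10) by the trapezoidal rule on [-half_width, half_width]."""
    lam = [tau / math.sqrt(tau**2 + s**2) for s in sigma]
    h = 2.0 * half_width / (n - 1)
    total = 0.0
    for step in range(n):
        u = -half_width + step * h
        value = STD.pdf(u)
        for j in range(len(x)):
            if j != i:
                z = (lam[i] * sigma[i] * u + lam[i]**2 * x[i] - lam[j]**2 * x[j]) / (lam[j] * sigma[j])
                value *= STD.cdf(z)
        total += (0.5 if step in (0, n - 1) else 1.0) * value
    return total * h


def bracket(i, x, sigma, tau, c):
    """M_i(x) / p(x) from (2.10); p(x) > 0, so this has the sign of M_i(x)."""
    return 1.0 - (1.0 + c) * prob_largest(i, x, sigma, tau)


def prob_largest_mc(x, sigma, tau, draws=100_000, seed=1):
    """Monte Carlo estimate of P(theta_i is largest | x) for every i, sampling
    theta_j ~ N(lam_j^2 x_j, (lam_j sigma_j)^2) independently."""
    rng = random.Random(seed)
    lam = [tau / math.sqrt(tau**2 + s**2) for s in sigma]
    counts = [0] * len(x)
    for _ in range(draws):
        theta = [rng.gauss(lam[j]**2 * x[j], lam[j] * sigma[j]) for j in range(len(x))]
        counts[max(range(len(x)), key=theta.__getitem__)] += 1
    return [n_i / draws for n_i in counts]


x, sigma, tau, c = [0.5, 1.0, 0.2], [1.0, 2.0, 0.5], 1.5, 2.0  # hypothetical inputs
quad = [prob_largest(i, x, sigma, tau) for i in range(len(x))]
mc = prob_largest_mc(x, sigma, tau)
print([round(q, 3) for q in quad], [round(m, 3) for m in mc])
print([bracket(i, x, sigma, tau, c) <= 0 for i in range(len(x))])  # rule (2.8)
```

Because $p(x) > 0$, only the bracket in (2.10) matters for rules (2.8) and (2.9), so the sketch never evaluates $p(x)$ itself.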

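If we let $\tau \to \infty$, so that the prior distribution of $\theta$ tends to be non-informative, then $\lambda_i \to 1$ and

$$\frac{M_i(x)}{p(x)} \to 1 - (1+c) \int_{-\infty}^{\infty} \prod_{\substack{j=1 \\ j \ne i}}^{k} \Phi\!\left(\frac{\sigma_i}{\sigma_j}\, u + \frac{x_i - x_j}{\sigma_j}\right) \varphi(u)\, du. \tag{2.11}$$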


Hence, the $i$th component is included in the selected subset if

$$\int_{-\infty}^{\infty} \prod_{\substack{j=1 \\ j \ne i}}^{k} \Phi\!\left(\frac{\sigma_i}{\sigma_j}\, u + \frac{x_i - x_j}{\sigma_j}\right) \varphi(u)\, du \ge \frac{1}{1+c}. \tag{2.12}$$

The above inequality holds for sufficiently large values of $(x_i - x_j)/\sigma_j$, $j \ne i$.

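In Problem II we select the $r$th component if the quantity on the left side of (2.12) is maximized for $i = r$. For $k = 2$ this quantity is equal to

$$\Phi\!\left(\frac{x_i - x_j}{\sqrt{\sigma_1^2 + \sigma_2^2}}\right), \qquad j \ne i,$$

because it is the probability that $\sigma_j Z_1 - \sigma_i Z_2 \le x_i - x_j$ for independent standard normal $Z_1$ and $Z_2$. Therefore, we select the component associated with the larger of the two values $x_1$ and $x_2$.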
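For $k = 2$ the closed form can be confirmed numerically. The Python sketch below evaluates the left side of (2.12) by the trapezoidal rule and compares it with $\Phi((x_i - x_j)/\sqrt{\sigma_1^2 + \sigma_2^2})$. Since $\Phi$ is increasing, the closed form also turns (2.12) into an explicit cutoff for Problem I, $x_i - x_j \ge \sqrt{\sigma_1^2 + \sigma_2^2}\,\Phi^{-1}(1/(1+c))$, which the sketch computes as well. The observations, standard deviations and $c$ are hypothetical.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution


def lhs_212(i, x, sigma, n=4001, half_width=10.0):
    """Left side of (2.12) (the tau -> infinity limit) by the trapezoidal rule."""
    h = 2.0 * half_width / (n - 1)
    total = 0.0
    for step in range(n):
        u = -half_width + step * h
        value = STD.pdf(u)
        for j in range(len(x)):
            if j != i:
                value *= STD.cdf(sigma[i] / sigma[j] * u + (x[i] - x[j]) / sigma[j])
        total += (0.5 if step in (0, n - 1) else 1.0) * value
    return total * h


x, sigma, c = [0.4, 1.1], [1.0, 2.5], 2.0  # hypothetical data; k = 2, so c >= 1
scale = math.hypot(sigma[0], sigma[1])     # sqrt(sigma_1^2 + sigma_2^2)

for i, j in ((0, 1), (1, 0)):
    closed = STD.cdf((x[i] - x[j]) / scale)
    print(i, round(lhs_212(i, x, sigma), 6), round(closed, 6))

# Problem I cutoff: component i is kept iff x_i - x_j >= scale * Phi^{-1}(1/(1+c)).
cutoff = scale * STD.inv_cdf(1.0 / (1.0 + c))
subset = [i for i, j in ((0, 1), (1, 0)) if x[i] - x[j] >= cutoff]
print(round(cutoff, 4), subset)
```

With $c = 1$ the cutoff is $\Phi^{-1}(1/2) = 0$, so only the component with the larger observation is kept, in agreement with the Problem II rule for $k = 2$.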

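Let $\sigma_1 = \cdots = \sigma_k = \sigma$, say, giving $\lambda_1 = \cdots = \lambda_k = \lambda$, say. We have

$$M_i(x) = p^0(x)\left[1 - (1+c) \int_{-\infty}^{\infty} \prod_{\substack{j=1 \\ j \ne i}}^{k} \Phi\!\left(u + \frac{\lambda}{\sigma}(x_i - x_j)\right) \varphi(u)\, du\right], \tag{2.13}$$

where $p^0(x)$ is obtained from $p(x)$ by substituting $\sigma$ for $\sigma_i$, $i = 1, \ldots, k$. We find that $M_i(x) \le 0$ whenever $x_i - x_j \ge 0$ for $j = 1, \ldots, i-1, i+1, \ldots, k$, since the integral is then at least $\int \Phi^{k-1}(u)\varphi(u)\, du = 1/k$, which is at least $1/(1+c)$ in Problem I, where $c \ge k-1$. Therefore, we include the $i$th component in the selected subset if none of these differences is large and negative. Also, $M_i(x)$ is minimized for the value of $i$ associated with the largest component of $x$. Therefore, in Problem II we select the component associated with the largest value among $x_1, \ldots, x_k$. These are the ordinary selection rules, which are thus seen to be Bayes rules.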
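The equal-variance conclusions can also be checked numerically. The Python sketch below evaluates the integral in (2.13) for hypothetical data with $k = 4$ and $c = k - 1 = 3$, the smallest value allowed in Problem I. It confirms that (i) when all the $x_i$ are equal the integral is $1/k$, (ii) the component with the largest $x_i$ satisfies the inclusion condition $M_i(x) \le 0$, and (iii) the index maximizing the integral, which minimizes $M_i(x)$, is the index of the largest $x_i$. The values of $\sigma$, $\tau$ and $x$ are assumptions.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution


def integral_213(i, x, sigma, tau, n=4001, half_width=10.0):
    """Integral in (2.13): common sigma and lambda = tau / sqrt(tau^2 + sigma^2)."""
    lam = tau / math.sqrt(tau**2 + sigma**2)
    h = 2.0 * half_width / (n - 1)
    total = 0.0
    for step in range(n):
        u = -half_width + step * h
        value = STD.pdf(u)
        for j in range(len(x)):
            if j != i:
                value *= STD.cdf(u + lam / sigma * (x[i] - x[j]))
        total += (0.5 if step in (0, n - 1) else 1.0) * value
    return total * h


sigma, tau, c = 1.0, 2.0, 3.0   # hypothetical values; c = k - 1
k = 4

# (i) Equal observations: the integral is int Phi(u)^(k-1) varphi(u) du = 1/k.
print(round(integral_213(0, [0.7] * k, sigma, tau), 6))

# (ii) and (iii) for a hypothetical observation vector.
x = [0.2, 1.5, 1.3, -0.4]
values = [integral_213(i, x, sigma, tau) for i in range(k)]
subset = [i for i in range(k) if values[i] >= 1.0 / (1.0 + c)]   # rule (2.8)
best = max(range(k), key=values.__getitem__)                     # rule (2.9)
print(subset, best)
```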

3. Just rules. First we define a stochastically increasing property (SIP) of a class of multivariate distributions. A set $A \subset \mathbb{R}^k$ is said to be monotone if $x \in A$ and $y_i \ge x_i$, $i = 1, \ldots, k$, imply $y \in A$. Let $P_\theta$ be a family of probability distributions on $\mathbb{R}^k$ indexed by a vector parameter $\theta = (\theta_1, \ldots, \theta_k)$, and let $\Omega \subset \mathbb{R}^k$ denote the parameter space. The family $P_\theta$ is said to have SIP with respect to $\theta$ if, whenever $\theta, \theta' \in \Omega$ and $\theta_i \le \theta_i'$, $i = 1, \ldots, k$, we have $P_\theta(A) \le P_{\theta'}(A)$ for all monotone sets $A$. A characterization of the SIP, due to Lehmann (1955), is as follows. A function $\psi(x)$ is said to be nondecreasing in $x$ if $x_i \le x_i'$, $i = 1, \ldots, k$, implies $\psi(x) \le \psi(x')$. A family of distributions $P_\theta$ has SIP with respect to $\theta$ if and only if $\int \psi(x)\, dP_\theta(x)$ is nondecreasing in $\theta$ whenever $\psi(x)$ is nondecreasing in $x$.

Now we define a just rule. A selection rule $\phi$ is said to be just if $\phi_i(x)$ is nondecreasing in $x_i$ and nonincreasing in $x_j$ $(j \ne i)$, for $i, j = 1, \ldots, k$.

Theorem 1 below shows that the Bayes rules given by (2.8) and (2.9) are just if the following assumptions are valid.

Assumption 1 - The posterior distribution of $\theta$ given $x$ has SIP with respect to $x$.

Assumption 2 - The function $L_i(\theta) - L_i^*(\theta)$ is nonincreasing in $\theta_i$ and nondecreasing in $\theta_j$ $(j \ne i)$.

Assumption 3 - $M_i(x)$ is a continuous function of $x$.

Assumption 3 is valid if, for example, the loss functions $L_i(\theta)$ and $L_i^*(\theta)$ are bounded and $p_\theta(x)$ is continuous in $x$ uniformly for $\theta \in \Omega$.

Theorem 1. If Assumptions 1 and 2 hold, then the Bayes rule (2.8) is just. If, moreover, Assumption 3 holds, then the Bayes rule (2.9) is just.

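Proof: Let $p(x) = \int p_\theta(x)\, dG(\theta)$ denote the marginal density of $X$, assumed positive. Then $M_i(x)/p(x)$, with $M_i(x)$ given by (2.6), is the posterior expectation of $L_i(\theta) - L_i^*(\theta)$ given $x$. From the characterization of the SIP given above and Assumptions 1 and 2, it follows that $M_i(x)/p(x)$ is nonincreasing in $x_i$ and nondecreasing in $x_j$ $(j \ne i)$. Since $M_i(x)$ and $M_i(x)/p(x)$ have the same sign, the function $\phi_i(x)$ given by (2.8) is nondecreasing in $x_i$ and nonincreasing in $x_j$ $(j \ne i)$. Hence the Bayes rule (2.8) is just if Assumptions 1 and 2 hold. If, moreover, Assumption 3 holds, then, since the minimizing index in (2.7) is unchanged when every $M_i(x)$ is divided by $p(x)$, it follows from the continuity of $M_i(x)$ and the monotonicity of $M_i(x)/p(x)$ that $\phi_i(x)$ given by (2.9) is nondecreasing in $x_i$ and nonincreasing in $x_j$ $(j \ne i)$. Hence the Bayes rule (2.9) is just. □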
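The normal example of Section 2 in the non-informative limit $\tau \to \infty$ gives a concrete case: the posterior of $\theta$ given $x$ is $N(x, \Sigma)$ with independent components, the loss (2.4) gives $L_i(\theta) - L_i^*(\theta) = 1 - (1+c)\,\mathbf{1}\{\theta_i = \theta_{[k]}\}$, which satisfies Assumption 2, and the integral in (2.12) is continuous in $x$. The Python sketch below is a grid check, not a proof: it evaluates the inclusion indicator of rule (2.8), in the form (2.12), while varying one coordinate of $x$ at a time, and confirms that the indicator is nondecreasing in $x_i$ and nonincreasing in $x_j$. The data, grid and $c$ are hypothetical.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution


def lhs_212(i, x, sigma, n=1001, half_width=10.0):
    """Left side of (2.12) by the trapezoidal rule on [-half_width, half_width]."""
    h = 2.0 * half_width / (n - 1)
    total = 0.0
    for step in range(n):
        u = -half_width + step * h
        value = STD.pdf(u)
        for j in range(len(x)):
            if j != i:
                value *= STD.cdf(sigma[i] / sigma[j] * u + (x[i] - x[j]) / sigma[j])
        total += (0.5 if step in (0, n - 1) else 1.0) * value
    return total * h


def include(i, x, sigma, c):
    """phi_i(x) for rule (2.8) in the tau -> infinity limit, i.e. condition (2.12)."""
    return 1 if lhs_212(i, x, sigma) >= 1.0 / (1.0 + c) else 0


def is_monotone(seq, increasing):
    """True if seq is nondecreasing (increasing=True) or nonincreasing (False)."""
    pairs = list(zip(seq, seq[1:]))
    if increasing:
        return all(a <= b for a, b in pairs)
    return all(a >= b for a, b in pairs)


sigma, c = [1.0, 0.5, 2.0], 2.0           # hypothetical sds; k = 3, so c >= k - 1
base = [0.0, 0.3, -0.2]                   # hypothetical observation
grid = [t / 4.0 for t in range(-12, 13)]  # -3.0, -2.75, ..., 3.0

checks = []
for i in range(3):          # component whose inclusion indicator is examined
    for m in range(3):      # coordinate of x that is varied along the grid
        path = []
        for t in grid:
            x = list(base)
            x[m] = t
            path.append(include(i, x, sigma, c))
        # Just rule: nondecreasing in x_i (m == i), nonincreasing in x_j (m != i).
        checks.append(is_monotone(path, increasing=(m == i)))
print(all(checks))
```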
In the application of Theorem 1 it would be interesting to find, for a given family of distributions which is stochastically increasing in $\theta$, the family of prior distributions for which Assumption 1 holds. We have not investigated this problem at any length. We discuss below some cases in which an appropriate prior distribution can be found for which Assumption 1 holds.

Clearly, $P_\theta$ has SIP with respect to $\theta$ if $\theta$ is a location parameter of the conditional distribution. The posterior distribution of $\theta$ with respect to a non-informative prior distribution $G$ which is uniform on $\mathbb{R}^k$ has SIP with respect to $x$. Similarly, $P_\theta$ has SIP with respect to $\theta$ if $\theta$ is a (component-wise) scale parameter of the conditional distribution. The posterior distribution of $\theta$ with respect to the non-informative prior on $\mathbb{R}_+^k$ with density function $g(\theta) = (\theta_1 \cdots \theta_k)^{-1}$ has SIP with respect to $x$.

The prior distributions for the location and scale parameters considered above, which lead to posterior distributions with SIP, are both improper. We now give an example of a proper prior distribution for a location parameter. Let $P_\theta$ denote the multivariate normal distribution $N(\theta, \Sigma)$ and let $\theta$ be distributed a priori according to $N(0, \Pi)$. A posteriori, $\theta$ is distributed according to the normal distribution $N\bigl(Ax, (\Sigma^{-1} + \Pi^{-1})^{-1}\bigr)$, where

$$A = (\Sigma^{-1} + \Pi^{-1})^{-1} \Sigma^{-1}.$$

Since a posteriori $\theta - Ax$ has a distribution not depending on $x$, the posterior distribution of $\theta$ has SIP with respect to $x$ if the elements of $A$ are nonnegative. The elements of $A$ are positive if the covariance matrices $\Sigma = (\sigma_{ij})$ and $\Pi = (\pi_{ij})$ are given by $\sigma_{ii} = 1$, $\sigma_{ij} = \rho$ $(i \ne j)$ with $-1/(k-1) < \rho < 0$, so that $\Sigma$ is positive definite, and $\pi_{ii} = 1/\lambda$ $(\lambda > 0)$, $\pi_{ij} = 0$ $(i \ne j)$. The matrix $A$ is then given by

$$A = \bigl(1 + \lambda(1-\rho)\bigr)^{-1} \left(I - \frac{\lambda \rho}{1 + \lambda\bigl(1 + (k-1)\rho\bigr)}\, J\right),$$

where $I$ denotes the identity matrix and $J$ denotes the matrix whose elements are all equal to 1.
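The closed form for $A$ can be verified without computing a matrix inverse. Since $(\Sigma^{-1} + \Pi^{-1})^{-1}\Sigma^{-1} = (I + \Sigma\Pi^{-1})^{-1}$, taking the prior covariance to be $\Pi = \lambda^{-1} I$ (an assumption stated here, with $\lambda > 0$) makes $A$ the inverse of $I + \lambda\Sigma$. The Python sketch below builds $A$ from the closed form for hypothetical $k$, $\rho$ and $\lambda$ with $-1/(k-1) < \rho < 0$, checks that $(I + \lambda\Sigma)A = I$, and checks that every entry of $A$ is positive.

```python
def closed_form_A(k, rho, lam):
    """A = (1 + lam(1 - rho))^{-1} (I - lam*rho / (1 + lam(1 + (k-1)rho)) J)."""
    a = 1.0 + lam * (1.0 - rho)
    off = -lam * rho / (1.0 + lam * (1.0 + (k - 1) * rho))
    return [[((1.0 if r == s else 0.0) + off) / a for s in range(k)] for r in range(k)]


def i_plus_lam_sigma(k, rho, lam):
    """I + lam * Sigma, where Sigma has unit variances and common correlation rho.
    Assumes Pi = I / lam, so that A = (I + lam * Sigma)^{-1}."""
    return [[(1.0 + lam) if r == s else lam * rho for s in range(k)] for r in range(k)]


def matmul(P, Q):
    """Plain-Python matrix product of list-of-lists matrices."""
    return [[sum(P[r][t] * Q[t][s] for t in range(len(Q))) for s in range(len(Q[0]))]
            for r in range(len(P))]


k, rho, lam = 4, -0.2, 0.5   # hypothetical values; -1/(k-1) < rho < 0
A = closed_form_A(k, rho, lam)
product = matmul(i_plus_lam_sigma(k, rho, lam), A)

is_identity = all(abs(product[r][s] - (1.0 if r == s else 0.0)) < 1e-12
                  for r in range(k) for s in range(k))
print(is_identity, min(min(row) for row in A) > 0)
```

The check works because $I + \lambda\Sigma = aI + bJ$ with $a = 1 + \lambda(1-\rho)$ and $b = \lambda\rho$, and $J^2 = kJ$ gives $(aI + bJ)^{-1} = a^{-1}\bigl(I - b(a + kb)^{-1}J\bigr)$.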

